16-726 Learning-Based-Image-Synthesis

Yiwen Zhao's Project Page

Assignment #1 - Colorizing the Prokudin-Gorskii Photo Collection

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) traveled across the vast Russian Empire starting in 1907 and took color photographs of everything he saw. He used a remarkable technique that recorded three exposures of every scene onto a glass plate, using a red, a green, and a blue filter. The plates were meant to be combined into a color image by a projector and shared with others.

His RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available on-line.

Here is the LoC's processing method:
  1. The entire plate is reduced to 8-bit grayscale mode.
  2. Under magnification, the quality of each image on the plate is reviewed for
    • contrast,
    • degree of color separation,
    • extent of damage to the emulsion,
    • and any other details that might affect the final color composite.
  3. The scan of the entire plate is aligned and the outside edges are cropped.
  4. Anchor points are used to further align the channels.
  5. The overlaid image is cropped to retain only the area that all three layers share.
  6. The cropped color composite is adjusted overall to create
    • the proper contrast,
    • appropriate highlight and shadow detail,
    • optimal color balance.
  7. Final adjustments may be applied to specific, localized areas of the composite color image to minimize defects associated with over or underexposure, development, or aging of the emulsion of Prokudin-Gorskii’s original glass plate.

Motivation

Design an algorithm to automatically align the 3 channels.

Naive Approach

The simplest implementation cuts the plate into three parts and overlays them directly, which produces an obvious misalignment, calling for further alignment.

img overlap

Pixel Alignment

Initially, I used raw pixel values to align the channels.

I tried single-scale matching first. It works well on cathedral.jpg, probably the easiest sample because it comes in a compressed (hence small) format. However, the search slows down drastically on the large .tif files.

To cut the time cost to under 1 minute, I use an image pyramid with a flexible structure of up to 5 levels, built with sk.transform.rescale. The number of levels depends on the input image size.
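The coarse-to-fine search can be sketched as follows. This is a minimal reconstruction, not the exact implementation: the NCC helper, the `min(shape) < 200` base case, and the refinement radius are illustrative choices.

```python
import numpy as np
from skimage import transform

def ncc(a, b):
    """Normalized cross-correlation between two same-sized 2D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def align_single_scale(moving, fixed, center=(0, 0), radius=15):
    """Exhaustively search for the shift of `moving` that best matches `fixed`."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), fixed)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(moving, fixed, levels=5):
    """Estimate the shift at half resolution, then refine at this resolution."""
    if levels == 0 or min(moving.shape) < 200:
        return align_single_scale(moving, fixed)
    coarse = align_pyramid(transform.rescale(moving, 0.5, anti_aliasing=True),
                           transform.rescale(fixed, 0.5, anti_aliasing=True),
                           levels - 1)
    # A doubled coarse estimate is off by at most a pixel or two, so a small
    # refinement window suffices at the finer level.
    return align_single_scale(moving, fixed,
                              center=(coarse[0] * 2, coarse[1] * 2), radius=2)
```

Each level only refines the doubled coarse estimate within a tiny window, which is why the total cost stays near a single small-image exhaustive search.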

img pyramid

Here are some of my observations on the Prokudin-Gorskii Photo Collection. I use them as the basic assumptions behind the algorithm design.

  1. Borders form a dark, frame-like structure. The black and white borders of each channel differ, and this inherent mismatch makes the borders useless, even detrimental, in alignment. Thus, they should be excised at the outset.

    img mismatch_flaw_borders

    I designed an automatic algorithm that searches for the border using the dark-pixel ratio within scan windows.

    img auto_edge
  2. The three channels start with only a small misplacement, which ensures that most of the image region can be used to compute the loss.
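The automatic border search described above can be sketched roughly like this. The thresholds and the scan-depth limit are illustrative guesses, not my exact values, and this sketch handles only the dark frame:

```python
import numpy as np

def crop_dark_border(channel, dark_thresh=0.2, ratio_thresh=0.5, max_frac=0.08):
    """Scan rows/columns inward from each side; any line whose fraction of
    dark pixels exceeds ratio_thresh is treated as border and cropped."""
    h, w = channel.shape
    dark = channel < dark_thresh

    def first_clean(lines):
        # Index of the first scan line that is mostly non-dark.
        for i, line in enumerate(lines):
            if line.mean() < ratio_thresh:
                return i
        return 0

    top = first_clean(dark[: int(h * max_frac)])
    bottom = first_clean(dark[::-1][: int(h * max_frac)])
    left = first_clean(dark.T[: int(w * max_frac)])
    right = first_clean(dark.T[::-1][: int(w * max_frac)])
    return channel[top : h - bottom, left : w - right]
```

Limiting the scan to a small fraction of the image keeps a genuinely dark scene from being mistaken for border.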

These assumptions depend heavily on the domain characteristics of the Prokudin-Gorskii Photo Collection. In other words, they may not transfer to other domains, which is a pity.

In some test cases, pixel matching is enough to output an 'OK' result.

image1

R (303, 185) G (294, 294)

Project 2

R (89, 7) G (40, 6)

Project 2

R (114, 8) G (55, 6)

Project 2

R (111, 1) G (49, 1)

Project 2

R (124, 5) G (60, 6)

I opted for np.roll to shift the R and G channels into alignment with the B channel. One thing worth mentioning is that the pixels wrapped around by the shift should not be counted in the loss calculation. Say a channel is moved up and to the left; a small rectangle then wraps into the bottom-right corner, and there is no reason to count it.

img np_roll
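One way to exclude that wrapped strip is to crop both channels to the region left valid by the shift before computing the loss. A sketch (the helper name is mine):

```python
import numpy as np

def valid_overlap(shifted, fixed, dy, dx):
    """Return the sub-regions of both channels untouched by np.roll
    wraparound, given that `shifted` was rolled by (dy, dx)."""
    h, w = fixed.shape
    ys = slice(max(dy, 0), h + min(dy, 0))
    xs = slice(max(dx, 0), w + min(dx, 0))
    return shifted[ys, xs], fixed[ys, xs]
```

When the candidate shift is correct, the two returned regions coincide exactly, while the discarded strip would only have added noise to the loss.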

I chose the NCC loss. It removes the effect of brightness via np.linalg.norm(). For a 2D input, this operation divides the array by $\|A\|_F$, the Frobenius norm.

$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$

Furthermore, due to the np.roll operation, the NCC should be normalized by the number of pixels actually used in the loss. (Computing only the dot product is fine when all inputs share one size, but not when sizes vary.)

By contrast, SSD does not lend itself to per-pixel averaging, and is thus less flexible across input sizes.

However, several cases remain hard for pixel matching. I think one reason is that a large portion of their content is dominated by a single R/G/B color.

img large_portion

For instance, if an image of a red circle is split into R/G/B grayscale images, the G and B channels may carry little information while the R channel contains dense pixel content, complicating the pixel search.

Edge Alignment

Fortunately, many edge features are shared across channels. I initially employed the Sobel kernel, which is effective at detecting vertical and horizontal lines. Matching on the gradient magnitude notably improved the reconstruction of the previous failure cases.
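The gradient-magnitude map can be computed per channel before running the same search, roughly as below (a sketch using skimage's Sobel filters; the function name is mine):

```python
import numpy as np
from skimage import filters

def gradient_magnitude(channel):
    """Combine horizontal and vertical Sobel responses into one magnitude
    map; shared structure such as straight contours survives in all three
    channels, while per-channel intensity differences drop out."""
    gx = filters.sobel_v(channel)  # response to vertical edges
    gy = filters.sobel_h(channel)  # response to horizontal edges
    return np.sqrt(gx ** 2 + gy ** 2)
```

The alignment search itself is unchanged; only its input switches from raw intensities to these edge maps.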

Project 1

R (160, 373) G (77, 8)

Project 1

R (173, 10) G (76, 8)

Project 1

R (111, 4) G (2553, 2509)

Project 1

R (112, 4) G (54, 5)

Project 1

R (2374, 1044) G (2230, 1143)

Project 1

R (137, 7) G (64, 4)

Project 1

R (0, 251) G (49, 7)

Project 1

R (106, 10) G (49, 7)

Project 1

R (2522, 2069) G (2486, 560)

Project 1

R (86, 9) G (41, 1)

left: Pixel Search -- right: Edge Search

I also tried the Canny detector, but on the emir.tif case it performed even worse than pixel search. Although Canny detects fine edges and curves better than Sobel, these details are not shared by all channels. In contrast, Sobel focuses on the straight (dark) lines common to every channel, resulting in better outcomes. Consequently, I chose Sobel for edge detection.

img amplify img amplify

left: Sobel Kernel -- right: Canny Kernel

 

Nonetheless, there is still some visible mismatching between channels. The blue artifacts are obvious here.

img amplify img amplify

 

One possible reason is the presence of animals (including humans), flowing water, or other moving objects in the scene. Because the three filtered exposures were captured sequentially, such elements naturally mismatch across channels.

Another reason is that, as the visualization below shows, the edge features are not exactly identical across channels.

img amplify

Hue Correction

I use the method proposed in Color Transfer between Images to correct the hue of the reconstructed images, since the filters Prokudin-Gorskii used were not exactly standard RGB.

The key is to find a decorrelated color space. The transfer chain can be summarized as RGB -> LMS -> log LMS -> lαβ -> log LMS -> LMS -> RGB. Hue correction is performed in lαβ space, where the means of the chromatic channels α and β are shifted to zero. The mean of the achromatic channel l is left unchanged, since there is no need to alter the overall luminance level, and the standard deviations also remain unaltered.
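A sketch of the correction follows. The RGB→LMS and logLMS→lαβ matrices are the ones given by Reinhard et al., quoted from memory and so best treated as approximate; the inverse transforms are computed numerically, and the function names are mine:

```python
import numpy as np

# RGB -> LMS matrix from Reinhard et al., "Color Transfer between Images".
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
# log LMS -> l-alpha-beta: a decorrelating rotation followed by scaling.
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ \
          np.array([[1.0, 1.0, 1.0], [1.0, 1.0, -2.0], [1.0, -1.0, 0.0]])

def rgb_to_lab(img):
    lms = np.clip(img @ RGB2LMS.T, 1e-6, None)  # avoid log(0)
    return np.log10(lms) @ LMS2LAB.T

def lab_to_rgb(lab):
    lms = 10 ** (lab @ np.linalg.inv(LMS2LAB).T)
    return lms @ np.linalg.inv(RGB2LMS).T

def correct_hue(img):
    """Shift the means of the chromatic channels (alpha, beta) to zero,
    leaving the achromatic mean and all standard deviations untouched."""
    lab = rgb_to_lab(img)
    lab[..., 1] -= lab[..., 1].mean()
    lab[..., 2] -= lab[..., 2].mean()
    return lab_to_rgb(lab)
```

The output can fall slightly outside [0, 1] after the shift, so clipping before display is advisable.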

before correction

Before Correction

after correction

After Correction